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1 Introduction 

Assume we want to sample from a sequence of probability distributions {^ n } n( zj^ defined 
on a common measurable space E where J\f = {0, 1, . . . ,p} or J\f = N. As a special case, 
one can set 7r„ = it for all n e JV. Alternatively the distribution can vary across A/". 
Similarly to simulated annealing, one could be interested in the sequence of distributions 
7r„ (dx) oc 7r 7 " (x) dx for an increasing schedule {7n} n& /v so as t° maximize 7r or 7r„ could 
be the posterior distribution of a parameter given the data collected till time n. In this 
paper, we are interested in sampling this sequence of distributions sequentially; that is 
first sampling from n then 7Ti and so on. We will further on refer to n as the time index. 

The tools favoured by statisticians to achieve this are Markov Chain Monte Carlo 
(MCMC) methods; see for example (Robert and Casella, 1999). To sample from w n , 
MCMC methods consists of building an crgodic Markov kernel K n with invariant dis- 
tribution 7r„ using Metropolis-Hastings (MH) steps, Gibbs steps etc. MCMC have been 
successfully used in many applications in statistics and physics. When the distribution to 
sample is multimodal, MCMC samplers can be easily stucked in one mode. A standard 
approach to improve mixing consists of using interacting parallel MCMC/tempering mech- 
anisms where one runs a MCMC chain on an extended space E N with a specified joint 
invariant distribution admitting 7r„ as a marginal (Geyer and Thompson, 1995). However, 
MCMC are not well adapted to sequential simulation. At index n, one needs to wait for 
the Markov chain with kernel K n to reach its stationary distribution w n . 

We propose here a different approach to sample from {^ n } n£ j^- Our approach is based 
on Sequential Monte Carlo (SMC) methods (Doucet et al, 2001; Liu, 2001). Henceforth 
the resulting algorithms will be called SMC samplers. SMC methods have been recently 
studied and used extensively in the context of sequential Bayesian inference and physics 
(Doucet et al, 2001; Iba, 2001; Liu, 2001). At a given time n, the basic idea is to 
obtain a large collection of A (N ;§> 1) random samples j X$ \ named particles 

I J i=l,...,JV 
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whose marginal distribution is asymptotically (TV — > oo) equal to n n . These particles are 
carried forward over time using a combination of Sequential Importance Sampling (SIS) 
and resampling ideas. This approach is very different from parallel MCMC algorithms 
where one builds a Markov kernel with a specified joint invariant distribution on E N . 

Standard SMC algorithms available in the literature do not apply to our problem. 
Indeed, these algorithms deal with the case where each target distribution of interest 7r„ is 
defined on E n with £7„_i C E n . In (Chopin, 2002), a SMC algorithm is proposed to deal 
with the case E n = E. However, this approach restricts severely the way particles can 
explore the space. The idea in this paper is different and consists of building an artificial 
sequence of distributions {^ n } ne j^ defined on E n = E n+1 with n n admitting a marginal 
7r„. We are then back to the standard SMC framework. More precisely, 7r„ is defined on 
E n+1 by 7Tq (dxo) = 7r (dxo) and 

n 

n n {d(x Q , . . . ,x n )) = 7r„ (dx n ) Y[ L k (x k ,dx k -i) (1) 

k=i 

where {^ri} rieJ v'\{o} i s a sequence of auxiliary Markov transition kernels. 

Our approach has some connections with Annealed Importance Sampling (AIS) (Neal, 
2001) and the algorithm recently proposed in (Cappe et at, 2002) which are detailed in 
Section 2. However, the framework we present here is more general and allows to derive 
a whole class of principled integration and genetic-type optimization algorithms based 
on interacting particle systems. Similarly to MCMC, the efficiency of the algorithms is 
dependent on the target distributions, the proposal and auxiliary kernels. Nevertheless, 
generally speaking, one can expect that SMC samplers will outperform MCMC when 
the distributions to sample arc multimodal with well-separated modes. Moreover SMC 
samplers can be used for sequential Bayesian inference problems with static parameters 
like those addressed by Chopin (2002). 

This paper focuses on the algorithmic aspects of SMC samplers. However, it is worth 
noting that our algorithms can be interpreted as an interacting particle approximating 
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model of a nonlinear Feynman-Kac flow in distribution space. Under additional assump- 
tions, we provide a nonlinear Markov interpretation of the measure- valued dynamic system 
associated to the flow {^n} ne ^f- We show that this interpretation is a natural nonlinear 
version of the MH algorithm. Many convergence results are available for Feynman-Kac 
flows and their interacting particle approximations (Del Moral and Miclo, 2000; Del Moral 
and Miclo, 2001) and, consequently, for SMC samplers. However, the Feynman-Kac flow 
associated to SMC samplers is such that many known estimates on the asymptotic be- 
haviour of these interacting processes can be greatly improved. Several of these results 
can be found in (Del Moral and Doucet, 2003). 

The rest of the paper is organized as follows. In Section 2, we review a generic SMC 
algorithm to sample from the sequence of distributions (|l]). Various settings for this 
algorithm are presented, some extensions and the connections with previous work are 
outlined. Section 3 describes the distribution flow associated to our interacting particle 
approximating model and also presents an original nonlinear Markovian interpretation of 
this flow. Section 4 applies this class of algorithms to a nonlinear regression problem. 
Finally, we discuss briefly a few open methodological and theoretical problems in Section 
5. 

2 Sequential Monte Carlo Sampling 
2.1 A Generic Algorithm 

We describe here a generic SMC algorithm to sample from the sequence of distributions 
{ftnjncjy defined in ([!]) based on a Sampling Importance Resampling strategy; see (Doucet 
et at, 2001) for a booklength survey of the SMC literature. Alternative SMC algorithms 
such as the Auxiliary Particle method of Pitt and Shephard (1999) could also be used. 

Further on we will use the notation Xo-k to denote (Xo, . . . , X^). At time n— 1, assume 
a set of particles {^o-i-il = 1, • • ■ N) distributed approximately according to Tr„_i is 
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available, i.e. the empirical measure 

A 1 N 

7T„_i (dXO:n-l) = *vW (dXO:n-l) 

i=l 

is an approximation of 7T n -i. At time n, we extend the path of each particle according to 
a Markov^ kernel M n (x n -i,dx n ). The resulting path is thus approximately distributed 
according to 7r„_i (dxo-. n -i) M n (x n —i,dx n ). Importance sampling can then be used to 
correct for the discrepancy between the sampling distribution and n n (dxo :n ), with the 
importance weight satisfying 

r ( T t r \ - ^— ( - dx ° :n ^ (2) 

7r„_i {dx 0:n -i) M n (Xn-!, dx n ) 



7T n {dXyi) Lfi (^m dXn- 



7r„_i (dx n -i) M n (x n -i,dx n ) 
and being assumed well-defined. Finally, the particles are resampled according to their 
importance weights; particles with low weights are discarded whereas particles with high 
weights are multiplied. The resampled particles are given an equal weight. To sum up, 
the algorithm proceeds as follows. 

Sequential Monte Carlo Sampler 

Initialization; n = 0. 

Sampling step 

• For i = 1, N, sample ~ vq (•). 



• For i = 1, ...,N, evaluate the normalized weights Wq 



(i) 



i=l 



Resampling step 



• Multiply/Discard particles jjfg l H with respect to high/low weights |Wq^ j to obtain 
N particles [x^j. 



x The Markov assumption could be relaxed. 



Iteration n; n G Af\ {0}. 

Sampling step 

• For i = 1, AT, set X^ n _ x = X^ n _ x and sample X$ ~ M„ x , 

• For i = 1, ...,N, evaluate the normalized weights W,i 

AT 
i=l 

Resampling step 

• Multiply/Discard particles jx^j with respect to high/low weights jwi^ j to obtain 
N particles {x^}. 

In this algorithm, Vq is the initial importance distribution. The resampling step can be 
done using a standard procedure such as multinomial resampling (Gordon et at, 1993), 
stratified resampling (Kitagawa, 1996) or minimum entropy resampling (Crisan, 2001). 
All these resampling schemes are unbiased; that is the number of times Ni the particle 
X^ n is copied satisfies E (Ni) — NWn* 1 ■ MCMC steps with invariant distribution 7r„ can 
also be included after the resampling step (Gilks and Berzuini, 1999). 

The complexity of this algorithm is in O (N) and it can be parallelized easily. In 
practice, the memory requirements are in O (N) too and do not increase over time as one 
does not need to keep in memory at time n the whole paths |^o*n| but only 

The algorithm can be interpreted as an adaptive importance sampling strategy. Ini- 
tially, «o is used and the particles with the highest importance weights are multiplied 
whereas the ones with small weights are discarded. At time n, new "candidate" particles 
are sampled according to a proposal distribution kernel M n . If M n is a random walk, 
then the new particles can be interpreted as a local exploration of the distribution. The 
crucial point is that these candidates are weighted by (||) so as to ensure that after the 
resampling step their distribution is approximately 7r„. The introduction of the auxiliary 



kernel L n allows the use of importance sampling without having to compute the marginal 
distribution / 7r„_i (du) M n (u, dx) of the particles Indeed, this marginal impor- 

tance distribution does not typically admit an analytical expression except when M n is a 
MCMC kernel of invariant distribution 7r„_i. A similar idea is the basis of the conditional 
Monte Carlo method described by Hammersley (1956). 

2.2 Particle Estimates 

At time n, we have the following empirical approximations of n n before the resampling 
step 

N 

7r n ,i(da;) = 2w«^w (dx) . 

i=l 

and after the resampling step it is equal to 

1 N 

7?n,2 (dx) = jjY, 5 x m (dx) . 

»=1 

For any measure fj, and function /, we will denote ji (/) = J f (x) \i (dx). An estimate of 
7Tn (/) is given by 

r N ~ 

f(x)^ 1 (dx)=J2 W n ) f{ X ^)- 
J i=l 

or alternatively / / (x) 7r„^2 (da;) = jf / ^ which has higher variance. If ir n = tt, 

then the following estimate can be also used 

n N 
fe=l i=l 

Though the particles are statistically dependent, one can show under assumptions given 
in (Del Moral and Miclo, 2000) that this estimate is consistent as N — > oo. 

The algorithm described above can also be used to compute the ratio of normalizing 
constants. Indeed, typically the sequence of distributions n n (dx) is only known up to 
a normalizing constant, i.e. say 7r„ (dx) oc /„ (x) dx. In this case, the unnormalized 

importance weights one computes are equal to 

— , fn [Xn ) ) L n (xi l) , dX^_A 

W« = V L V L — oc w M 



at time n. It is possible to obtain an estimate of the ratio of the normalizing constants 

Z n / f n (x) dx 

Z n -1 J fn-l (x) dx 

using 

-I N . 

= ^E^ ( 4 ) 



Zn-l N . 

I— l 



Thus an estimate of log (Z n /Zo) is given by 

If the resampling scheme used is unbiased, then (|J) is also unbiased (Del Moral and Miclo, 
2000). 

2.3 Algorithm Settings 

The algorithm presented in the previous subsection is very general. There are many 
potential choices for {Tt n , M n , L n } n£j ^ leading to various integration and optimization 
algorithms. 

Homogeneous sequences. A simple choice consists of setting 7r„ = tt, M n = M and 
L n = L. In this case, the importance weight (0) is the following generalized MH ratio 

' 7T (da;) M (x, dx') ' 
the standard MH ratio corresponds to M = L. In this simple case, the particles evolve 
independently according to a proposal distribution M, their generalized MH ratio is com- 
puted and normalized. The particles are then multiplied or discarded with respect to the 
value of their normalized MH ratio. 

Sequence of distributions Tr n . It might be of interest to consider non homogeneous 
sequence of distributions either to move "smoothly" from ttq = vq to a target distribution 
7r through a sequence of intermediate distributions or for the sake of optimization. In the 
case of integration as suggested by Neal (2001), one can select 

7r„ (dx) oc 7T 7 ™ (ar) 7Tq~ 7 " (a;) dx 



with M = {0, . . . ,p}, 70 = and 7 p = 1. For the case of optimization, one can select 



resulting algorithm is a genetic algorithm where the sampling step is the "mutation" 
step and the resampling step is the selection step (Goldberg, 1989). However, there 
is a significant difference with standard genetic algorithms as we know the asymptotic 
(N — » oo) distribution of the particles. This makes the analysis of the resulting algorithm 
easier than in cases where this distribution is unknown such as in (Del Moral and Miclo, 
2003). Convergence properties of the algorithm are currently under study. 

Finally, another application of this algorithm consists of estimating the sequence of 
posterior distributions 7r„ (dx) — Tt n (dx\yi t ...,y n ) where y n is an observation available 
at time n. As briefly discussed in the introduction, SMC algorithms have been recently 
proposed in this framework by Chopin (2002) but the SIS framework used is somehow 
restricted: it only allows M n to be a MCMC kernel of invariant distribution 7T n _i. 

Sequence of proposal kernels M n and auxiliary kernels L n . Any couple of kernels 
can be used as long as the ratio ((H) is well defined. However, one can only expect good 
properties of the algorithm if this ratio admits a reasonable variance and also if L n is 
mixing. Indeed, loosely speaking, the faster L n mixes, the faster the SMC algorithm 
forgets Monte Carlo errors (Del Moral and Doucet, 2003). 

In SMC algorithms (Doucet et at, 2001), it is known that the importance sampling 
distribution minimizing the conditional variance of the weights at time n, i.e. {-^o*n— l f 



7r„ [dx) (x ■K ln (x) dx 



where N = N, {j n }. 



is an increasing sequence such that 7, 



n 



00. In this case, the 




fixed, is given by 



M n (x, dx') 



n n (dx 1 ) L n (x' , dx) 
J 7r„ (du) L n (u, dx) 



(6) 



In this case, the importance weight G n (x, x') is given by 



G n (x) 



J 7r„ (du) L n (u, dx) 



(7) 



7r n _i (dx) 
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and is independent of x' . This allows the resampling step to be performed before the 
sampling step. 

In standard applications of SMC algorithms, the kernel L n is usually given by the 
problem at hand whereas in our setup this kernel is arbitrary and can be optimized for 
a given proposal distribution M n . One can alternatively select the kernel L n so as to be 
able to compute ((?]); e.g. a MCMC kernel of invariant distribution ir n , and then sample 
the particles according to (^). 

For a fixed M n , an alternative natural choice^ consists of choosing 

7r„_i (dx')M n (x',dx) 
J 7r„_i (du) M n (u,dx) 

In this case, the associated importance weight G n (x,x') is given by 

G (x')- nn {dx>) (9) 

n[ >~ fn n ^(du)M n (u,dx'Y W 

If M n is a MCMC kernel of invariant distribution 7r„_i, then the weight (JsJ> can be 

computed easily. If not, numerical integration using the current set of particles can be 

used to approximate it but the resulting algorithms would be of complexity O {N 2 J . 

2.4 Connections to previous work and Extensions 

Connections to previous work. AIS is a method proposed recently by Neal (2001). Re- 
versing the time index in (Neal, 2001) to be consistent with our notation, AIS corresponds 
to the case where one considers a finite sequence of distributions, M n is a MCMC kernel 
of invariant distribution 7r„ and 

L n (x, dx') = M n (x', dx) nn ^ . (10) 

7T„ (dX) 

For a given M n , one can check that this choice of L n ensures that (||) is satisfied. In this 
case, one obtains by combining (JsJ) and ( |l0| ) 

7T„ (dx) 



Gn (x,x ) — G n (x) 



7T„_l (dx) ' 



2 thanks to C. Andrieu 



The resampling step is not included in the AIS algorithm. In our framework, we point 
out that this is a crucial step to include to make the method efficient as established 
theoretically in (Del Moral and Miclo, 2000) and practically in our simulations. Otherwise 
the method is just a special instance of SIS and collapses if n is too large. In (Godsill and 
Clapp, 2001), the authors used the AIS algorithm in combination with resampling in the 
context of optimal filtering. 

A more recent work (Cappe et at, 2002) contemporary of (Del Moral and Doucet, 
2003) and developed independently is another special case of our framework. In (Cappe 
et at, 2002), the authors consider the homogeneous case. Their algorithm corresponds 
to the case where M is an MCMC kernel of invariant distribution it (namely a Gibbs 
sampler) and L (x, dx') — n (dx r ), it follows that 

G(x x ')- n ^ 

This particular case has limited applications as G (x, x') would not be defined in most 
applications; e.g. tt (dx') — tt (x 1 ) dx 1 and M is an MH kernel. 

Extensions. The algorithm described in this section must be interpreted as the basic 
element of more complex algorithms. It is what the MH algorithm is to MCMC. For 
complex MCMC problems, one typically uses a combination of MH steps where the n x 
components of x say (xi, . . . ,x nx ) are updated by subblocks (Robert and Casella, 1999). 
Similarly, to sample from high dimensional distributions, a practical SMC sampler can 
update the components of x via subblocks. There are also numerous potential extensions: 

• It is straightforward to develop a version of the algorithm so as to sample distribu- 
tions defined on an union of subspaces of different dimensions. However, contrary 
to reversible jump MCMC algorithms (Green, 1995), no reversibility condition is 
needed. 

• As suggested in (Crisan and Doucet, 2000), one can use a proposal kernel whose 
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parameters are a function of the whole set of current particles. This allows the 
algorithm to automatically scale the proposal distribution based on the previous 
importance weights. 

In the general case, the sequence of probability distributions {Tt n } nE u- of interest 

is such that 7r„ is defined on E n and not on E. We can generalize the algorithm 

described in this section to this case. We introduce an auxiliary kernel L n from E n 

to E n —i and a proposal kernel M n from E n -\ to E n , At time n — 1, N particles 

approximately distributed according to 7r„_i are available. At time N new 

particles are sampled according to X$ ~ M n ^X^^, and the following 

importance weights are computed 

7T„ (dX^L n (x%\dX®^ 



ex 



7r n _! (dA^i) Mn (x^dXP 



Then the particles are resampled. 

3 Feynman-Kac Representation and Particle Interpretations 

In this Section, we show that the algorithm presented in Section ^ corresponds to an 
interacting particle approximation model of a nonlinear Feynman-Kac flow in distribution 
space. We provide an alternative nonlinear Markovian representation of this flow and 
its interacting particle approximation. Here the Feynman-Kac flow corresponds to the 
special case where the so-called potential function is given by the generalized Metropolis 
ratio The abstract description and the analysis of general Feynman-Kac flows and 
their particle approximations have been investigated in several recent research articles. 
Many asymptotic (n — * oo and/or N — > oo) results are available in this field including 
empirical process convergence, central limit theorems, large deviation principles as well 
as increasing propagation of chaos estimates and uniform convergence estimates with 
respect to the time parameter; all of which can be used for SMC samplers. The interested 
reader is referred to the survey article (Del Moral and Miclo, 2000) and the more recent 
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studies (Del Moral and Miclo, 2001; Del Moral and Miclo, 2003). As mentioned in the 
introduction, the particular choice of the potential function (Q) simplifies the analysis and 
many known estimates on the asymptotic behaviour of these interacting processes can be 
greatly improved. Several of these results can be found in (Del Moral and Doucet, 2003). 

3.1 Feynman-Kac Representation 

Define the following distributions on E 2 = E x E 

(7r„ x L n ) (d (as, as')) = n n (dx') L n (x' , dx) . 

Using (|l|) and (||), it is clear that the sequence of distributions {ir n x L n } neJ ^- (with the 
convention ttq x Lq = ttq x ttq) admits the following so-called Feynman-Kac representation 

(lTn X L n ) (f) = X n (/) /A„ (1) , 

with 

An (/) = E 

v ,{M k ] I / (X n -i,X n ) Go (Xq) JJ Gfe (Xtc-i, Xk) ] , 
where E„ AM k } denotes the expectation with respect to 

n 

vo (dxo) JJ M k {xk-i, dx k ) ■ 
fe=i 

This representation is at the core of the results given in (Del Moral and Miclo, 2000). 
We give now two "operator-like" interpretations of the sequence {(7r„ x L n )} n£j y. For a 
measure fi and a Markov kernel K, we use the standard notation 

fiK (A)= [ fi (dz) K (z, dz') . 
J A 

Let V (E 2 ) be the set of probability measures on E 2 . The mapping : V (E 2 ) -> V (E 2 ) 
is defined as 

7T„ x L n = ( (7T„_1 x L 

n— 1 )M n ) (11) 

where 

fJ. (Gn) 
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and M n is a Markov kernel on E 2 denned as 

M n {(u,v) ,d(u',v')) = 5 V (du')M n (u',dv r ) . 

Assuming that G„ can be upper bounded over E 2 , one can easily check that an alternative 
representation is given by 

(tt„ x L n ) = (tt„_i x L„_ 1 )M„S , fi (7rn _ iXin _ i)Sn (13) 

where 

S n ,p ((u, v) , d (u', «')) - eG„ (u, v) (d (u', v')) + (1 - eG (u, w)) *„ (m) (d (u', v')) , 

(14) 

e being chosen such that eG„ (u, w) < 1 over E 2 . 

The kernel MnS 1 ^ ixL ^ is a so-called nonlinear Markov kernel; i.e. the tran- 
sition kernel is dependent not only on the current state but also on its distribution. A 
generic nonlinear Markov chain {Z n } n>0 satisfies 

Z n ~ ^"n,Law(Z„_i) {Z n -1, ') • 

It is typically impossible to simulate a realization from such a Markov chain as the 
distribution of the state is not available. However, a particle approximation of it can 
be used. Consider a Markov chain {Z n } n>0 taking values in E 2 with transition ker- 
nel M n S , ixL . This kernel can be interpreted as follows. Given Z„_i = 
(U n -i,V n -i) ~ (7r„_i x L„_i), one first sample a candidate Z* n = (U*,V*) = (V n -i,V*) 
where V* ~ M n {U*, ■). With probability eG„ (Z7*, one sets Z n = {U*,V*), otherwise 
Z„ ~ (fan-i x L„_i) M„^ . By construction, one has Z n <~ (717, x L n ). This algorithm 
can be interpreted as a nonlinear version of the MH algorithm. The main difference being 
that, when a candidate is rejected, the chain does not stay where it is a new state is 
proposed according to ( (7T„_i x L„_i) M„). 



13 



3.2 Particle Interpretations 

The first particle interpretation of the flow follows ([ll])-([l2|). It corresponds to the stan- 
dard algorithm which has been described in Section ^. The second alternative algorithm 
corresponds to a particle interpretation of the flow corresponding to (|l^) . It proceeds as 

follows. 

Iteration n; n £ Af\ {0}. 

Sampling step 

• For i = 1, TV, set X$ n _ x = and sample A#> ~ M n (l^, •) . 

• For i = 1, N, evaluate the normalized weights Wn 

N 

W« cx G„ (X» 1( JW) , J] W$ = i. (is) 

t=i 

Resampling step 

• J = 0. 

• For i = l,...,iV, with probability eG„ (x^A^), set = A^ otherwise set 

J = JU {«}. 

• Multiply/Discard particles |Aq 1 ^ j with respect to high/low weights to obtain 

4 Simulation Results 
4.1 Model 

We consider the following harmonic regression model (Andrieu and Doucet, 1999) 

Y = D(lu)(3 + ri, 
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where Y = (y , . . .,y m -if, P = {Pi, ■ ■ ■ ,/3 2 fe) T , n = (n , . . .,n m _ 1 ) T , u = . . . ,uj k ) T e 
(0, n) k and D (ui) is a m x 2k matrix where for i = 0, . . . , m — 1, j = 1, . . . , k . 

\ D ( w )]i+i,2j-i = cos (Wji) , [D (w)] i+ i i2 j = sin (wji) . 

We assume that n| cr 2 ~ A/" (0, <7 2 / p ) and we use the following priorp (cr 2 , j3, ui) — p (w) p (/3| cr 2 ) p (a 2 ) 
with 

CT 2 ~:T<?(f,f), /3|a 2 ~Af(0,a 2 £o), 

where E^ 1 = 5~ 2 D T (w) D (w) (<5 2 = 25), vo = 7o = 1; p( w ) is uniform on Q = 
jw 6 (0, 7r) fe ; < wi < . . . < Wfc < 7r|. The posterior density satisfies on 

p(w\y) oc ( 7o + r T pr) 2 

with 

M- 1 = (l + <r 2 )£> T (w)D(w), 
to = MD T (w) Y, 
P = I p - D (w) MD T (lu) . 
We simulate a realization of m = 100 observations with fc = 6, cr 2 = 5, 

w = (0.08, 0.13, 0.21, 0.29, 0.35, 0.42) T , 

(3 = (1.24, 0.00, 1.23, 0.43, 0.67, 1.00, 1.11, 0.39, 1.31, 0.16, 1.28, 0.13) T . 
The posterior density is multimodal with well-separated modes. 

4.2 Algorithms 

To sample from tt (u>) = p ( ui \ Y) , we use an homogeneous SMC sampler with N — 1000 
particles where the k components are updated one-at-a-time using a simple Gaussian 
random walk proposal M of standard deviation cjrw- We select L to be equal to M 
and use the stratified resampling procedure. We compare our algorithm with a MCMC 
algorithm. The MCMC algorithm updates the component one-at-a-time using a MH 
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step with the proposal kernel M. In both case, the initial distribution is the uniform 
distribution on ft. 

We consider the case where cfrw = 0.1. Obviously one could come up with a better 
proposal kernel. We want to emphasize here that the SMC approach is more robust to 
a poor scaling of the proposal. A similar remark was made in (Cappe et al, 2002). In 
Figure 1 , we present the marginal posterior distributions of oj\ and 102 obtained using the 
SMC sampler with 100 iterations. We then run 12000 iterations of the MCMC algorithm 
so as the computational complexity to be roughly the same for the two algorithms. The 
MCMC algorithm is more sensitive to the initialization. On 50 realizations of the SMC 
and the MCMC algorithm, the SMC always explores the main mode whereas the MCMC 
algorithm converges towards it only 36 times. 

We also use an inhomogeneous version of the SMC sampler so as to optimize p (u\ Y). 
In this case the target density at time n is n n (w) oc p ln (ui\Y) with j n = n and we use 
50 iterations. We compare this algorithm to a simulated annealing version of the MH 
algorithm with 60000 iterations with j n = ra/1200. In Table 1, we display the mean 
and standard deviations of the log-posterior density of the posterior mode estimate; the 
posterior mode estimate being chosen as the sample generated during the simulation 
maximizing the posterior density. Contrary to the simulated annealing algorithm, the 
SMC algorithm converges consistently towards the same mode. 



Algorithm SMC MCMC 

Mean of the log-posterior values -326.12 -328.87 

Standard deviation of the 0.12 1.48 

log-posterior values 

Table 1: Performance of SMC and MCMC algorithm obtained over 50 simulations 
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Figure 1: Histograms of the simulated values of (u>i,u>2) using SMC - Estimation of 
p(u)i\Y) (top) and p{w%\Y) (bottom). 

5 Discussion 

In this article, we have presented a class of methods to sample from distributions known up 
to a normalizing constant. These methods are based on SMC algorithms. This framework 
is very general and flexible. Several points not discussed here are detailed in (Del Moral 
and Doucet, 2003). 

• In the homogeneous case, assume that we do not initialize the algorithm in the sta- 
tionary regime, i.e. we do not correct for the discrepancy between vq and ir. This 
has to be paralleled with MCMC algorithms which are not initialized in the station- 
ary regime. Under regularity assumptions, it can be shown that the distribution 
flow still converges towards the target distribution n. Moreover, it converges at a 
rate only dependent on the mixing properties of L. This is in contrast with the MH 
algorithm whose rate of convergence is dependent on n and M. 
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• The algorithm we have presented can be used to simulate a Markov chain with a 
fixed terminal point. Indeed, one obtains at time n samples from (|l|). By setting 
Lq (xi,dxa) — S x (dxo) and reversing the time index, one obtains an approximate 
realization of a Markov process of initial distribution 7r„ at time 0, transition {L n } 
and terminal point x at time n + 1. This has applications in genetics and physics. 

There are also several important open methodological and theoretical problems to 
study. 

• Similarly to MCMC methods, one needs to carefully design the various components 
of the algorithm to get good performance. In particular, it would be of interest to 
come up with an automated choice for L n given M n . For the homogeneous case, 
one could look at minimizing the variance of (||). It involves a tradeoff between the 
mixing properties of L and the variance of the importance weights (|J). This point 
is currently under study. 

• It would be interesting to weaken the assumptions of the results in (Del Moral and 
Miclo, 2000; Del Moral and Doucet, 2003) which mostly only hold for compact 
spaces. 
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